Thinking about Light Curves

This is an update to the first version of lightCurveAnalysis. In this one, we do more time ranges and explicitly hold off some test points from the training. The same procedure applies here, though. We’ll make some truth functions, sample the functions, and represent truth with Gaussian Processes. Then we’ll draw samples from those GPs as the input to DLDT images. Those will get fed into a DNN to do classification on some test points. Fun.

Generate Truth Signals

We’re going to pick some basic functions to serve as our “truth” data. There’s no reason behind the selection of these functions other than that they’re sort of similarly shaped and scaled and with enough noise there might be some overlap. Look in “helperFunctions.R” to see the definitions of the functions. The truth data for the functions is plotted below.

require(tibble)
## Loading required package: tibble
require(dplyr)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
require(magrittr)
## Loading required package: magrittr
require(ggplot2)
## Loading required package: ggplot2
require(tidyr)
## Loading required package: tidyr
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:magrittr':
## 
##     extract
require(purrr)
## Loading required package: purrr
## Warning: package 'purrr' was built under R version 3.5.3
## 
## Attaching package: 'purrr'
## The following object is masked from 'package:magrittr':
## 
##     set_names
typeNames = c('truth1','truth2','truth3','truth4')
numTypes = length(typeNames)
timeVals = 0:100
numTimes = length(timeVals)
                  
process1TruthTable <- createTruthData(timeVals,typeNames,createProcess1Funcs(numTypes))
## Warning: The `printer` argument is deprecated as of rlang 0.3.0.
## This warning is displayed once per session.
process1TruthTable %>% ggplot(aes(x=time,y=value,color=func)) + geom_point() + geom_line()

As before, we add complexity to the problem by purposefully losing track of the functions and working purely with sampled data. To do that, we randomly sample the functions, add Gaussian noise, and learn a Gaussian Process over the noisy samples. The Gaussian Process allows for variable uncertainty depending on how close a requested sample is to a training point and the ability to freely sample at non-prespecified times.

We plot the sampled points with and without noise below.

numToKeep = 50
sampleNoise = 0.01
sampledTable <- downSampleTruthPoints(process1TruthTable,numToKeep,typeNames,timeVals)
sampledTable %<>% mutate(noiseValue = value + rnorm(n(),0,sampleNoise))
sampledTable %>% ggplot(aes(x=time,y=value,color=func)) + geom_point() + geom_line()

sampledTable %>% ggplot(aes(x=time,y=noiseValue,color=func)) + geom_point()

Fit a Gaussian process to the sampled data. Create maximum likelihood predictions at all of the original time points to show that the GPs are doing what they’re supposed to. The prediction points also includes error bars. Depending on the points used and the noise added to the samples difference across the range may be visible.

process1GPs <- fitGaussianProcesses(typeNames,sampledTable)
## Loading required package: GauPro
## Warning: package 'GauPro' was built under R version 3.5.3
gpPredictions <- createGaussianProcessPredictions(process1GPs,typeNames,timeVals)
gpPredictions %>% ggplot(aes(x=time,y=mean,color=type)) + geom_line() + geom_linerange(aes(ymin=lower,ymax=upper)) 

Now we generate samples from the GPs to serve as our training data. We randomize the times that are used in each sample through use of a set of bins. These bins represent different windows of the process, allowing different amounts of information into the final sample set. From these random time profiles we generate the observed profiles. Through the way we sample from the GP, noise is already baked in. We’ll use the same set of random time points for all of our types. In a perfect world, we would randomize across this, but for the purposes of this experiment, it shouldn’t matter too much.

In the final output table, data is organized by trial and truth type.

maxTimesForBins = c(20,40,60,80,100)
numProfilesPerTimeBin = c(30,30,30,30,30)
numPointsPerProfileRanges = list(4:10,8:20,12:30,16:40,20:50)


process1TrainingData <-
  generateSamples(process1GPs,
                  typeNames,
                  maxTimesForBins,
                  numProfilesPerTimeBin,
                  numPointsPerFrofileRanges)
require(gganimate)
## Loading required package: gganimate
## Warning: package 'gganimate' was built under R version 3.5.3
require(gifski)
## Loading required package: gifski
## Warning: package 'gifski' was built under R version 3.5.3
process1TrainingDataPlot <- process1TrainingData %>% mutate(trial=as.factor(trial))  %>% ggplot(aes(x=time,y=value,color=type)) + geom_point() + transition_states(trial,transition_length=1,state_length=2) + labs(title='Trial: {closest_state}')
animate(process1TrainingDataPlot,nframes=3*sum(numProfilesPerTimeBin))

Now convert all of the samples into DValDt data. We’ll create these graphs and then the images in two steps to show the process.

totalProfilesPerType = sum(numProfilesPerTimeBin)
process1TrainingDldtData <-
  convertSamplesToDValDt(process1TrainingData,typeNames,totalProfilesPerType)
displayTrials = seq(from=5,to=totalProfilesPerType,by=5)

process1TrainingDlDtPlot <- 
  process1TrainingDldtData %>% filter(trial %in% displayTrials) %>%
  ggplot(aes(x=time,y=value,color=type)) + geom_point() + ylim(c(-2,2)) + 
  facet_wrap(~type) + transition_states(trial,transition_length=1,state_length=2) +
  labs(title='Trial: {closest_state}')

animate(process1TrainingDlDtPlot,nframes=3*totalProfilesPerType/5)

And now we create images. There are a couple of parameters that have to get set for the binning process that are defined by the algorithm. First is “limits”“, a 2x2 matrix that has the x-range in teh first row and the y-range in the second. Then”bins“, a 2 element array that contains the number of x-bins and the number of y-bins. We use a 16x16 image since this seems to fit the shape of the data well and is reasonably small for later processing by our DBN.

require(ash)
## Loading required package: ash
require(imager)
## Loading required package: imager
## 
## Attaching package: 'imager'
## The following object is masked from 'package:tidyr':
## 
##     fill
## The following object is masked from 'package:magrittr':
## 
##     add
## The following objects are masked from 'package:stats':
## 
##     convolve, spectrum
## The following object is masked from 'package:graphics':
## 
##     frame
## The following object is masked from 'package:base':
## 
##     save.image
limits <- matrix(c(0,-2,100,2),2,2)
xPixels = 16
yPixels = 16
bins <- c(xPixels,yPixels)
numPixels = xPixels * yPixels

process1DldtTrainingImages <- createImages(process1TrainingDldtData,
                                           typeNames,
                                           totalProfilesPerType,
                                           limits,bins)
process1DldtTrainingImagesPlot <- 
  process1DldtTrainingImages %>% filter(trial %in% displayTrials) %>% 
  ggplot(aes(x,y)) + geom_raster(aes(fill=value)) + facet_wrap(~type) +
  transition_states(trial,transition_length=1,state_length=2) + 
  labs(title='Trial: {closest_state}')
  
animate(process1DldtTrainingImagesPlot,nframes=3*totalProfilesPerType/5)  

Analyze Images

The nice thing about converting to the images is that we now have a regular size across all samples to either do visualization or comparisons. Here we’ll take a slight detour and put all the points into tsne and see what happens.

Note that tsne requires all the points to be unique, so we have to test for and remove duplicates. We’ll tolerate this since this is just for display, although it tells us that maybe we should sample across different times in the future.

require(Rtsne)
## Loading required package: Rtsne
## Warning: package 'Rtsne' was built under R version 3.5.3
process1TrainingImagesInRows <- widenImageDataFrame(process1DldtTrainingImages,numPixels)
tsneTrainingImagesOutput <- getTSNEEmbedding(process1TrainingImagesInRows,numPixels)
## Warning: `as.tibble()` is deprecated, use `as_tibble()` (but mind the new semantics).
## This warning is displayed once per session.
require(ggforce)
## Loading required package: ggforce
## Warning: package 'ggforce' was built under R version 3.5.3
tsneTrainingImagesOutput %>% ggplot(aes(x=V1,y=V2,color=type)) + geom_point()

#try a second one with some ellipses
tsneTrainingImagesOutput %>% ggplot(aes(x=V1,y=V2,color=type)) + geom_mark_ellipse(aes(fill=type)) + geom_point()

#this makes an acyclical graph plot of the points.
ggplot(tsneTrainingImagesOutput, aes(x=V1,y=V2)) +
  geom_delaunay_tile(alpha = 0.3) + 
  geom_delaunay_segment2(aes(colour = type, group = -1), size = 1,
                         lineend = 'round')

process1TsneTrainingAnime <- 
  tsneTrainingImagesOutput %>% ggplot(aes(x=V1,y=V2,color=type)) + 
  geom_point(aes(size=2)) +
  transition_states(trial,transition_length=1,state_length=2) +
  labs(title='Trial: {closest_state}') + shadow_mark(alpha = 0.6, size = 1)

animate(process1TsneTrainingAnime,nframes=3*totalProfilesPerType)

Learn Classifier

Having made some pictures, we begin to learn a model. We’ve already widened the data into a matrix that is appropriate for using in an NN. We attempt that here. The charts below show how we self-test on the training data. Charts 1 and 3 are based on truth data. There are 150 trials for each type, how many of those did we get right. Chart 1 is overall and Chart 3 is based on time. Charts 2 and 4 are based on predictions. How many predictions of a given type were correct. By combining the two, we can see how we misclassified things. I will look into making an alluvial plot for this as well.

#A function accomplishes all of the work of training the model, and knows something about the structure we want to use, so we send in our data
require(keras)
## Loading required package: keras
require(tensorflow)
## Loading required package: tensorflow
#install_keras()

process1KerasModel <- trainKerasModel(process1TrainingImagesInRows)
#We'll test our training data here.
process1SelfTestPrediction <-
  testKerasModel(process1TrainingImagesInRows,process1KerasModel)
process1SelfTestPrediction %>% ggplot(aes(truth)) + geom_bar(aes(fill=correct))

process1SelfTestPrediction %>% ggplot(aes(predict)) + geom_bar(aes(fill=correct))

trialBins <- c(31,61,91,121,151)
process1SelfTestPrediction %<>% mutate(trialBin = case_when(trial<trialBins[1] ~ 1,
                                                         trial<trialBins[2] ~ 2,
                                                         trial<trialBins[3] ~ 3,
                                                         trial<trialBins[4] ~ 4,
                                                         trial<trialBins[5] ~ 5))

process1SelfTestPrediction %>% ggplot(aes(truth)) + geom_bar(aes(fill=correct)) + facet_wrap(~trialBin) 

process1SelfTestPrediction %>% ggplot(aes(predict)) + geom_bar(aes(fill=correct)) + facet_wrap(~trialBin)

#require(ggalluvial)

#Not using this style
# ggplot(process1Alluvial3,
#        aes(y = value, axis1=truth,axis2 = trialBin, axis3=predict)) +
#   geom_alluvium(aes(fill = correct), width = 1/12) +
#   geom_stratum(width = 1/12, fill = "black", color = "grey") +
#   geom_label(stat = "stratum", label.strata = TRUE)



process1TrainingAlluvial <- process1SelfTestPrediction %>%
  mutate(truth=as.factor(truth),
         predict=as.factor(predict),
         trialBin=as.factor(trialBin)) %>%
  group_by(truth, predict,trialBin) %>%
  summarise(n()) %>% rename(value = 'n()') %>%
  mutate(correct=ifelse(truth==predict,'good','bad')) %>% 
  gather_set_data(c(1:3))
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
process1TrainingAlluvial %>% ggplot(aes(x, id = id, split = y, value = value)) +
  geom_parallel_sets(aes(fill = correct), alpha = 0.3, axis.width = 0.1) +
  geom_parallel_sets_axes(axis.width = 0.1) +
  geom_parallel_sets_labels(colour = 'white')

Test Classifier

To test our model, we create new samples that weren’t in the original set. We will make a fair amount less points than the original, since we don’t need to train a model.

numProfilesPerTimeBin_test = c(5,5,5,5,5)


process1TestDataTable <- generateSamples(process1GPs,
                                         typeNames,
                                         maxTimesForBins,
                                         numProfilesPerTimeBin_test,
                                         numPointsPerFrofileRanges)
#A little side number we need to store for later is the length and index of the longest trial in our training set.

maxLengthTestTrial <- process1TestDataTable %>% filter(type==typeNames[1]) %>% 
  group_by(trial) %>% summarise(n()) %>% top_n(n = 1)
## Selecting by n()
process1TestDataPlot <- 
  process1TestDataTable %>% mutate(trial=as.factor(trial))  %>%
  ggplot(aes(x=time,y=value,color=type)) + geom_point() + 
  transition_states(trial,transition_length=1,state_length=2) + 
  labs(title='Trial: {closest_state}')

animate(process1TestDataPlot,nframes=3*sum(numProfilesPerTimeBin_test))

Create DLDT’s. We won’t show this data, since it’s similar to above. Only the Images.

totalProfilesPerType_test = sum(numProfilesPerTimeBin_test)
process1TestdldtData <- convertSamplesToDValDt(process1TestDataTable,
                                               typeNames,
                                               totalProfilesPerType_test)

Here we create the test images, and show them.

process1DldtTestImages <- createImages(process1TestdldtData,
                                        typeNames,
                                       totalProfilesPerType_test,
                                       limits,
                                       bins)
process1DldtTestImages %>% ggplot(aes(x,y)) + geom_raster(aes(fill=value)) +
  facet_wrap(~type) + transition_states(trial,transition_length=1,state_length=2) +
  labs(title='Trial: {closest_state}')

Organize the test data into Row vectors for application to the DBN.

process1TestImagesInRows <- widenImageDataFrame(process1DldtTestImages,numPixels)

process1OutOfSampleTestPredictionTable <- testKerasModel(process1TestImagesInRows,
                                                         process1KerasModel)
process1OutOfSampleTestPredictionTable %>% ggplot(aes(truth)) + 
  geom_bar(aes(fill=correct))

process1OutOfSampleTestPredictionTable %>% ggplot(aes(predict)) +
  geom_bar(aes(fill=correct))

trialBins <- c(6,11,16,21,26)
process1OutOfSampleTestPredictionTable %<>% 
  mutate(trialBin = case_when(trial<trialBins[1] ~ 1,
                              trial<trialBins[2] ~ 2,
                              trial<trialBins[3] ~ 3,
                              trial<trialBins[4] ~ 4,
                              trial<trialBins[5] ~ 5))

process1OutOfSampleTestPredictionTable %>% ggplot(aes(truth)) + 
  geom_bar(aes(fill=correct)) + facet_wrap(~trialBin) 

process1OutOfSampleTestPredictionTable %>% ggplot(aes(predict)) +
  geom_bar(aes(fill=correct)) + facet_wrap(~trialBin)

Remake the Sankey plots for the test data.

process1TestAlluvial <- process1OutOfSampleTestPredictionTable %>%
  mutate(truth=as.factor(truth),
         predict=as.factor(predict),
         trialBin=as.factor(trialBin)) %>%
  group_by(truth, predict,trialBin) %>%
  summarise(n()) %>% rename(value = 'n()') %>%
  mutate(correct=ifelse(truth==predict,'good','bad')) %>% 
  gather_set_data(c(1:3))
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
process1TestAlluvial %>% ggplot(aes(x, id = id, split = y, value = value)) +
  geom_parallel_sets(aes(fill = correct), alpha = 0.3, axis.width = 0.1) +
  geom_parallel_sets_axes(axis.width = 0.1) +
  geom_parallel_sets_labels(colour = 'white')

Time Series Prediction

One thing we’re interested in a practical sense is to follow a time series along and see how our estimation of the object changes over time. To do this, we take the longest test sample for each type, and then divide it up into increasingly long parts. We calculate the type at each time stop to see when/how it gets to the right answer.

#select test trial 25 from each of our classes.  We have to go back to the original test data and then build up the dldts and images.  

#label the data with a time block index.
timeBlocks <- seq(from=0,to=100,by=10)

longestTestTrial = maxLengthTestTrial$trial

process1TimeSeriesTestInput <- process1TestDataTable %>% 
  filter(trial == longestTestTrial) %>% select(-trial) %>% 
  mutate(timeBlock = cut(time,breaks = timeBlocks,labels=FALSE))

process1TimeSeriesTestInputExpanded <- 
  1:length(timeBlocks) %>% map(.f=function(blockIndex){
  
  process1TimeSeriesTestInput %>% filter(timeBlock <= blockIndex) %>%
    mutate(trial=blockIndex) %>% bind_rows()
  
}) %>% bind_rows() %>% select(-timeBlock)
numTimeSeriesTrials = length(timeBlocks) - 1;

process1TimeSeriesTestdldtData <-
  convertSamplesToDValDt(process1TimeSeriesTestInputExpanded,typeNames,numTimeSeriesTrials)

process1TimeSeriesTestdldtImages <- 
  createImages(process1TimeSeriesTestdldtData,typeNames,numTimeSeriesTrials,limits,bins)
process1TimeSeriesTestImagesInRows <- widenImageDataFrame(process1TimeSeriesTestdldtImages,
                                                  numPixels)
process1TimeSeriesTestPredictionTable <-
  testKerasModelForProbType(process1TimeSeriesTestImagesInRows,process1KerasModel)
process1TimeSeriesTestPredictionTable %>%
  ggplot(aes(x=trial,y=probability,color=predictedType)) + geom_line() + 
  geom_point() + facet_wrap(~truth)

Fusing Multiple Sources

Now we can move to fusing data from different sources. The DLDT technique is interesting for this, as it allows for a combination of scales in a natural way that concatenating vectors of real functions doesn’t exactly lead to.

We generate some new data

process2TruthTable <- createTruthData(timeVals,typeNames,createProcess2Funcs(numTypes))
process3TruthTable <- createTruthData(timeVals,typeNames,createProcess3Funcs(numTypes))
process2TruthTable %>% ggplot(aes(x=time,y=value,color=func)) + geom_point() + geom_line()

process3TruthTable %>% ggplot(aes(x=time,y=value,color=func)) + geom_point() + geom_line()

We proceed in a similar manner to above by sampling times (we’ll use the same as above for ease), computing the Gaussian process, sampling profiles, computing DLDT images. Afterwhich we will combine data between the processes.

process2Noise = 5.0
process3Noise = 1.0

process2SampledTable <-
  downSampleTruthPoints(process2TruthTable,numToKeep,typeNames,timeVals) %>%
  mutate(noiseValue = value + rnorm(n(),0,process2Noise))

process3SampledTable <-
  downSampleTruthPoints(process3TruthTable,numToKeep,typeNames,timeVals) %>%
  mutate(noiseValue = value + rnorm(n(),0,process3Noise))
process2GPs <- fitGaussianProcesses(typeNames,process2SampledTable)
process3GPs <- fitGaussianProcesses(typeNames,process3SampledTable)
process2TrainingDataTable <- generateSamples(process2GPs,typeNames,maxTimesForBins,numProfilesPerTimeBin,numPointsPerFrofileRanges)
process3TrainingDataTable <- generateSamples(process3GPs,typeNames,maxTimesForBins,numProfilesPerTimeBin,numPointsPerFrofileRanges)
process2DldtTrainingData <- 
  convertSamplesToDValDt(process2TrainingDataTable,typeNames,totalProfilesPerType)

process3DldtTrainingData <- 
  convertSamplesToDValDt(process3TrainingDataTable,typeNames,totalProfilesPerType)
process2TrainingDlDtPlot <- 
  process2DldtTrainingData %>% filter(trial %in% displayTrials) %>%
  ggplot(aes(x=time,y=value,color=type)) + geom_point() + ylim(c(0,150)) + 
  facet_wrap(~type) + transition_states(trial,transition_length=1,state_length=2) +
  labs(title='Trial: {closest_state}')

animate(process2TrainingDlDtPlot,nframes=3*totalProfilesPerType/5)
## Warning: Removed 17 rows containing missing values (geom_point).

## Warning: Removed 17 rows containing missing values (geom_point).

## Warning: Removed 17 rows containing missing values (geom_point).
## Warning: Removed 18 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing missing values (geom_point).

## Warning: Removed 4 rows containing missing values (geom_point).

## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 20 rows containing missing values (geom_point).

## Warning: Removed 20 rows containing missing values (geom_point).

## Warning: Removed 20 rows containing missing values (geom_point).
## Warning: Removed 22 rows containing missing values (geom_point).

## Warning: Removed 22 rows containing missing values (geom_point).

## Warning: Removed 22 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 5 rows containing missing values (geom_point).

## Warning: Removed 5 rows containing missing values (geom_point).

## Warning: Removed 5 rows containing missing values (geom_point).
## Warning: Removed 7 rows containing missing values (geom_point).

## Warning: Removed 7 rows containing missing values (geom_point).

## Warning: Removed 7 rows containing missing values (geom_point).
## Warning: Removed 47 rows containing missing values (geom_point).

## Warning: Removed 47 rows containing missing values (geom_point).

## Warning: Removed 47 rows containing missing values (geom_point).
## Warning: Removed 88 rows containing missing values (geom_point).

## Warning: Removed 88 rows containing missing values (geom_point).

## Warning: Removed 88 rows containing missing values (geom_point).
## Warning: Removed 72 rows containing missing values (geom_point).

## Warning: Removed 72 rows containing missing values (geom_point).

## Warning: Removed 72 rows containing missing values (geom_point).
## Warning: Removed 18 rows containing missing values (geom_point).

## Warning: Removed 18 rows containing missing values (geom_point).

## Warning: Removed 18 rows containing missing values (geom_point).
## Warning: Removed 39 rows containing missing values (geom_point).

## Warning: Removed 39 rows containing missing values (geom_point).

## Warning: Removed 39 rows containing missing values (geom_point).
## Warning: Removed 37 rows containing missing values (geom_point).

## Warning: Removed 37 rows containing missing values (geom_point).

## Warning: Removed 37 rows containing missing values (geom_point).
## Warning: Removed 103 rows containing missing values (geom_point).

## Warning: Removed 103 rows containing missing values (geom_point).

## Warning: Removed 103 rows containing missing values (geom_point).
## Warning: Removed 55 rows containing missing values (geom_point).

## Warning: Removed 55 rows containing missing values (geom_point).

## Warning: Removed 55 rows containing missing values (geom_point).
## Warning: Removed 163 rows containing missing values (geom_point).

## Warning: Removed 163 rows containing missing values (geom_point).

## Warning: Removed 163 rows containing missing values (geom_point).
## Warning: Removed 66 rows containing missing values (geom_point).

## Warning: Removed 66 rows containing missing values (geom_point).

## Warning: Removed 66 rows containing missing values (geom_point).
## Warning: Removed 80 rows containing missing values (geom_point).

## Warning: Removed 80 rows containing missing values (geom_point).

## Warning: Removed 80 rows containing missing values (geom_point).
## Warning: Removed 62 rows containing missing values (geom_point).

## Warning: Removed 62 rows containing missing values (geom_point).

## Warning: Removed 62 rows containing missing values (geom_point).
## Warning: Removed 122 rows containing missing values (geom_point).

## Warning: Removed 122 rows containing missing values (geom_point).

## Warning: Removed 122 rows containing missing values (geom_point).
## Warning: Removed 40 rows containing missing values (geom_point).

## Warning: Removed 40 rows containing missing values (geom_point).

## Warning: Removed 40 rows containing missing values (geom_point).
## Warning: Removed 184 rows containing missing values (geom_point).

## Warning: Removed 184 rows containing missing values (geom_point).

## Warning: Removed 184 rows containing missing values (geom_point).
## Warning: Removed 167 rows containing missing values (geom_point).

## Warning: Removed 167 rows containing missing values (geom_point).

## Warning: Removed 167 rows containing missing values (geom_point).
## Warning: Removed 322 rows containing missing values (geom_point).

## Warning: Removed 322 rows containing missing values (geom_point).

## Warning: Removed 322 rows containing missing values (geom_point).
## Warning: Removed 56 rows containing missing values (geom_point).

## Warning: Removed 56 rows containing missing values (geom_point).

## Warning: Removed 56 rows containing missing values (geom_point).
## Warning: Removed 266 rows containing missing values (geom_point).

## Warning: Removed 266 rows containing missing values (geom_point).

## Warning: Removed 266 rows containing missing values (geom_point).
## Warning: Removed 65 rows containing missing values (geom_point).

## Warning: Removed 65 rows containing missing values (geom_point).

## Warning: Removed 65 rows containing missing values (geom_point).
## Warning: Removed 161 rows containing missing values (geom_point).

## Warning: Removed 161 rows containing missing values (geom_point).

## Warning: Removed 161 rows containing missing values (geom_point).
## Warning: Removed 120 rows containing missing values (geom_point).

## Warning: Removed 120 rows containing missing values (geom_point).

## Warning: Removed 120 rows containing missing values (geom_point).
## Warning: Removed 302 rows containing missing values (geom_point).

## Warning: Removed 302 rows containing missing values (geom_point).

## Warning: Removed 302 rows containing missing values (geom_point).

process3TrainingDlDtPlot <- 
  process3DldtTrainingData %>% filter(trial %in% displayTrials) %>%
  ggplot(aes(x=time,y=value,color=type)) + geom_point() + ylim(c(0,30)) + 
  facet_wrap(~type) + transition_states(trial,transition_length=1,state_length=2) +
  labs(title='Trial: {closest_state}')

animate(process3TrainingDlDtPlot,nframes=3*totalProfilesPerType/5)
## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 5 rows containing missing values (geom_point).

## Warning: Removed 5 rows containing missing values (geom_point).

## Warning: Removed 5 rows containing missing values (geom_point).
## Warning: Removed 3 rows containing missing values (geom_point).

## Warning: Removed 3 rows containing missing values (geom_point).

## Warning: Removed 3 rows containing missing values (geom_point).
## Warning: Removed 6 rows containing missing values (geom_point).

## Warning: Removed 6 rows containing missing values (geom_point).

## Warning: Removed 6 rows containing missing values (geom_point).
## Warning: Removed 7 rows containing missing values (geom_point).

## Warning: Removed 7 rows containing missing values (geom_point).

## Warning: Removed 7 rows containing missing values (geom_point).
## Warning: Removed 63 rows containing missing values (geom_point).

## Warning: Removed 63 rows containing missing values (geom_point).

## Warning: Removed 63 rows containing missing values (geom_point).
## Warning: Removed 3 rows containing missing values (geom_point).

## Warning: Removed 3 rows containing missing values (geom_point).

## Warning: Removed 3 rows containing missing values (geom_point).
## Warning: Removed 77 rows containing missing values (geom_point).

## Warning: Removed 77 rows containing missing values (geom_point).

## Warning: Removed 77 rows containing missing values (geom_point).
## Warning: Removed 136 rows containing missing values (geom_point).

## Warning: Removed 136 rows containing missing values (geom_point).

## Warning: Removed 136 rows containing missing values (geom_point).
## Warning: Removed 94 rows containing missing values (geom_point).

## Warning: Removed 94 rows containing missing values (geom_point).

## Warning: Removed 94 rows containing missing values (geom_point).
## Warning: Removed 6 rows containing missing values (geom_point).

## Warning: Removed 6 rows containing missing values (geom_point).

## Warning: Removed 6 rows containing missing values (geom_point).
## Warning: Removed 219 rows containing missing values (geom_point).

## Warning: Removed 219 rows containing missing values (geom_point).

## Warning: Removed 219 rows containing missing values (geom_point).
## Warning: Removed 128 rows containing missing values (geom_point).

## Warning: Removed 128 rows containing missing values (geom_point).

## Warning: Removed 128 rows containing missing values (geom_point).
## Warning: Removed 221 rows containing missing values (geom_point).

## Warning: Removed 221 rows containing missing values (geom_point).

## Warning: Removed 221 rows containing missing values (geom_point).
## Warning: Removed 336 rows containing missing values (geom_point).

## Warning: Removed 336 rows containing missing values (geom_point).

## Warning: Removed 336 rows containing missing values (geom_point).
## Warning: Removed 253 rows containing missing values (geom_point).

## Warning: Removed 253 rows containing missing values (geom_point).

## Warning: Removed 253 rows containing missing values (geom_point).
## Warning: Removed 56 rows containing missing values (geom_point).

## Warning: Removed 56 rows containing missing values (geom_point).

## Warning: Removed 56 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).

## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 412 rows containing missing values (geom_point).

## Warning: Removed 412 rows containing missing values (geom_point).

## Warning: Removed 412 rows containing missing values (geom_point).

process2Limits <- matrix(c(0,0,100,150),2,2)
process3Limits <- matrix(c(0,0,100,30),2,2)
# xPixels = 16
# yPixels = 16
# bins <- c(xPixels,yPixels)
# numPixels = xPixels * yPixels


process2DldtTrainingImages <- createImages(process2DldtTrainingData,typeNames,totalProfilesPerType,process2Limits,bins)

process3DldtTrainingImages <- createImages(process3DldtTrainingData,typeNames,totalProfilesPerType,process3Limits,bins)
process2DldtTrainingImagesPlot <- process2DldtTrainingImages %>% filter(trial %in% displayTrials) %>% ggplot(aes(x,y)) + geom_raster(aes(fill=value)) + facet_wrap(~type) + transition_states(trial,transition_length=1,state_length=2) + labs(title='Trial: {closest_state}')
  
animate(process2DldtTrainingImagesPlot,nframes=3*totalProfilesPerType/5)  

process3DldtTrainingImagesPlot <- process3DldtTrainingImages %>% filter(trial %in% displayTrials) %>% ggplot(aes(x,y)) + geom_raster(aes(fill=value)) + facet_wrap(~type) + transition_states(trial,transition_length=1,state_length=2) + labs(title='Trial: {closest_state}')
  
animate(process3DldtTrainingImagesPlot,nframes=3*totalProfilesPerType/5)  

We can combine these images into RGB frames. We’ll just pick a couple to show rather than animating through.

process2Type1Trial31 = process2DldtTrainingImages %>% filter(type=='truth1',trial==31)
process3Type1Trial31 = process3DldtTrainingImages %>% filter(type=='truth1',trial==31)

type1trial31RGB = process1DldtTrainingImages %>% filter(type=='truth1',trial==31)  %>%
  mutate(value2 = process2Type1Trial31$value,value3 = process3Type1Trial31$value) %>% 
  mutate(value = value/256,value2 = value2/256,value3=value3/256) %>%
  mutate(rgb.val=rgb(value,value2,value3))


process2Type3Trial81 = process2DldtTrainingImages %>% filter(type=='truth3',trial==81)
process3Type3Trial81 = process3DldtTrainingImages %>% filter(type=='truth3',trial==81)

type3trial81RGB = process1DldtTrainingImages %>% filter(type=='truth3',trial==81)  %>%
  mutate(value2 = process2Type3Trial81$value,value3 = process3Type3Trial81$value) %>%
  mutate(value = value/256,value2 = value2/256,value3=value3/256) %>%
  mutate(rgb.val=rgb(value,value2,value3))
type1trial31RGB %>% ggplot(aes(x,y))+geom_raster(aes(fill=rgb.val))+scale_fill_identity()

type3trial81RGB %>% ggplot(aes(x,y))+geom_raster(aes(fill=rgb.val))+scale_fill_identity()

Now we wideng the datatables and send to training.

process2TrainingImagesInRows <- 
  widenImageDataFrame(process2DldtTrainingImages,numPixels) %>% select(-c(trial,type))
process3TrainingImagesInRows <- 
  widenImageDataFrame(process3DldtTrainingImages,numPixels) %>% select(-c(trial,type))

rgbTrainingImagesInRows <-
  bind_cols(process1TrainingImagesInRows,
            process2TrainingImagesInRows,
            process3TrainingImagesInRows)
rgbKerasModel <- trainKerasModel(rgbTrainingImagesInRows)
#We'll test our training data here.
rgbSelfTestPredictionTable <- testKerasModel(rgbTrainingImagesInRows,rgbKerasModel)
rgbSelfTestPredictionTable %>% ggplot(aes(truth)) + geom_bar(aes(fill=correct))

rgbSelfTestPredictionTable %>% ggplot(aes(predict)) + geom_bar(aes(fill=correct))

trialBins <- c(31,61,91,121,151)
rgbSelfTestPredictionTable %<>% mutate(trialBin = case_when(trial<trialBins[1] ~ 1,
                                                         trial<trialBins[2] ~ 2,
                                                         trial<trialBins[3] ~ 3,
                                                         trial<trialBins[4] ~ 4,
                                                         trial<trialBins[5] ~ 5))

rgbSelfTestPredictionTable %>% ggplot(aes(truth)) + geom_bar(aes(fill=correct)) + facet_wrap(~trialBin) 

rgbSelfTestPredictionTable %>% ggplot(aes(predict)) + geom_bar(aes(fill=correct)) + facet_wrap(~trialBin)

Now we do test points for the rgb data.

process2TestDataTable <-
  generateSamples(process2GPs,typeNames,maxTimesForBins,numProfilesPerTimeBin_test,numPointsPerFrofileRanges)

process3TestDataTable <-
  generateSamples(process3GPs,typeNames,maxTimesForBins,numProfilesPerTimeBin_test,numPointsPerFrofileRanges)
## Adding nugget to get sample
process2TestdldtData <-
  convertSamplesToDValDt(process2TestDataTable,typeNames,totalProfilesPerType_test)

process3TestdldtData <-
  convertSamplesToDValDt(process3TestDataTable,typeNames,totalProfilesPerType_test)
process2DldtTestImages <- createImages(process2TestdldtData,
                                       typeNames,
                                       totalProfilesPerType_test,
                                       process2Limits,
                                       bins)

process3DldtTestImages <- createImages(process3TestdldtData,
                                       typeNames,
                                       totalProfilesPerType_test,
                                       process3Limits,
                                       bins)
process2TestImagesInRows <- 
  widenImageDataFrame(process2DldtTestImages,numPixels) %>% select(-c(trial,type))
process3TestImagesInRows <- 
  widenImageDataFrame(process3DldtTestImages,numPixels) %>% select(-c(trial,type))

rgbTestImagesInRows <- bind_cols(process1TestImagesInRows,
                                 process2TestImagesInRows,
                                 process3TestImagesInRows)


rgbTestPredictionTable <- testKerasModel(rgbTestImagesInRows,rgbKerasModel)
rgbTestPredictionTable %>% ggplot(aes(truth)) + geom_bar(aes(fill=correct))

rgbTestPredictionTable %>% ggplot(aes(predict)) + geom_bar(aes(fill=correct))

trialBins <- c(6,11,16,21,26)
rgbTestPredictionTable %<>% mutate(trialBin = case_when(trial<trialBins[1] ~ 1,
                                                         trial<trialBins[2] ~ 2,
                                                         trial<trialBins[3] ~ 3,
                                                         trial<trialBins[4] ~ 4,
                                                         trial<trialBins[5] ~ 5))

rgbTestPredictionTable %>% ggplot(aes(truth)) + geom_bar(aes(fill=correct)) + facet_wrap(~trialBin) 

rgbTestPredictionTable %>% ggplot(aes(predict)) + geom_bar(aes(fill=correct)) + facet_wrap(~trialBin)

Now do the RGB time series.

#select test trial 25 from each of our classes.  We have to go back to the original test data and then build up the dldts and images.  

process2TimeSeriesTestInput <- process2TestDataTable %>% 
  filter(trial == longestTestTrial) %>% select(-trial)

process3TimeSeriesTestInput <- process3TestDataTable %>% 
  filter(trial == longestTestTrial) %>% select(-trial)



process2TimeSeriesTestInput %<>% 
  mutate(timeBlock = cut(time,breaks = timeBlocks,labels=FALSE))
process3TimeSeriesTestInput %<>% 
  mutate(timeBlock = cut(time,breaks = timeBlocks,labels=FALSE))

process2TimeSeriesTestInputExpanded <- 
  1:length(timeBlocks) %>% map(.f=function(blockIndex){
    
    process2TimeSeriesTestInput %>% filter(timeBlock <= blockIndex) %>% 
      mutate(trial=blockIndex) %>% bind_rows()
  
}) %>% bind_rows() %>% select(-timeBlock)

process3TimeSeriesTestInputExpanded <- 
  1:length(timeBlocks) %>% map(.f=function(blockIndex){
    
    process3TimeSeriesTestInput %>% filter(timeBlock <= blockIndex) %>% 
      mutate(trial=blockIndex) %>% bind_rows()
  
}) %>% bind_rows() %>% select(-timeBlock)
process2TimeSeriesTestdldtData <-
  convertSamplesToDValDt(process2TimeSeriesTestInputExpanded,typeNames,numTimeSeriesTrials)
process3TimeSeriesTestdldtData <-
  convertSamplesToDValDt(process3TimeSeriesTestInputExpanded,typeNames,numTimeSeriesTrials)

process2TimeSeriesTestdldtImages <- 
  createImages(process2TimeSeriesTestdldtData,typeNames,numTimeSeriesTrials,process2Limits,bins)
process3TimeSeriesTestdldtImages <- 
  createImages(process3TimeSeriesTestdldtData,typeNames,numTimeSeriesTrials,process3Limits,bins)
process2TimeSeriesTestImagesInRows <-
  widenImageDataFrame(process2TimeSeriesTestdldtImages,numPixels) %>%
  select(-c(trial,type))

process3TimeSeriesTestImagesInRows <-
  widenImageDataFrame(process3TimeSeriesTestdldtImages,numPixels) %>%
  select(-c(trial,type))


rgbTimeSeriesTestImagesInRows <- bind_cols(process1TimeSeriesTestImagesInRows,
                                           process2TimeSeriesTestImagesInRows,
                                           process3TimeSeriesTestImagesInRows)


rgbTimeSeriesTestPredictionTable <-
  testKerasModelForProbType(rgbTimeSeriesTestImagesInRows,rgbKerasModel)
rgbTimeSeriesTestPredictionTable %>% ggplot(aes(x=trial,y=probability,color=predictedType)) + geom_line() + geom_point() + facet_wrap(~truth)